AITopics | short-term reward

Collaborating Authors

short-term reward

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

4038c9208dfc22644c60ad39c24e5c53-Paper-Conference.pdf

Neural Information Processing SystemsFeb-11-2026, 19:47:42 GMT

artificial intelligence, machine learning, preference vector, (18 more...)

Neural Information Processing Systems

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (0.93)
Education (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Learning the Optimal Policy for Balancing Short-Term and Long-Term Rewards Qinwei Y ang

Neural Information Processing SystemsOct-10-2025, 00:22:21 GMT

The DPPL method is capable of obtaining optimal policies even when multiple rewards are interrelated.

objective, optimal policy, preference vector, (15 more...)

Neural Information Processing Systems

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (0.93)
Education (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Long-term Off-Policy Evaluation and Learning

Saito, Yuta, Abdollahpouri, Himan, Anderton, Jesse, Carterette, Ben, Lalmas, Mounia

arXiv.org Machine LearningApr-24-2024

Short- and long-term outcomes of an algorithm often differ, with damaging downstream effects. A known example is a click-bait algorithm, which may increase short-term clicks but damage long-term user engagement. A possible solution to estimate the long-term outcome is to run an online experiment or A/B test for the potential algorithms, but it takes months or even longer to observe the long-term outcomes of interest, making the algorithm selection process unacceptably slow. This work thus studies the problem of feasibly yet accurately estimating the long-term outcome of an algorithm using only historical and short-term experiment data. Existing approaches to this problem either need a restrictive assumption about the short-term outcomes called surrogacy or cannot effectively use short-term outcomes, which is inefficient. Therefore, we propose a new framework called Long-term Off-Policy Evaluation (LOPE), which is based on reward function decomposition. LOPE works under a more relaxed assumption than surrogacy and effectively leverages short-term rewards to substantially reduce the variance. Synthetic experiments show that LOPE outperforms existing approaches particularly when surrogacy is severely violated and the long-term reward is noisy. In addition, real-world experiments on large-scale A/B test data collected on a music streaming platform show that LOPE can estimate the long-term outcome of actual algorithms more accurately than existing feasible methods.

lope, short-term reward, variance, (16 more...)

arXiv.org Machine Learning

2404.15691

Country:

Asia > Singapore > Central Region > Singapore (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.67)

Industry:

Media > Music (0.48)
Leisure & Entertainment (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Q Learning - Ashwin Vaidya

#artificialintelligenceMar-17-2019, 11:15:49 GMT

Before I explain what Q Learning is, I will quickly explain the basic principle of reinforcement learning. Reinforcement learning is a category of machine learning algorithms where the systems learn on their own by interacting with the environment. The idea is that a reward is provided to the agent if the action it takes is correct. Otherwise, some penalty is assigned to discourage the action. It is similar to how we train dogs to perform tricks, give it a snack for successfully doing a roll and rebuke it for dirtying your carpet.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback